Fitted value function iteration with probability one contractions
نویسندگان
چکیده
منابع مشابه
Convergent Fitted Value Iteration with Linear Function Approximation
Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, “Expansion-Constrained Ordinary Least Squares” (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the least squares regression operator to be a non-expansion in the∞-norm. We show that the space of fu...
متن کاملSampling Based Fitted Value Iteration Finite-Time Bounds for Sampling-Based Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision problems (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The con...
متن کاملFinite-Time Bounds for Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The co...
متن کاملOptimizing spoken dialogue management with fitted value iteration
In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...
متن کاملBoosted Fitted Q-Iteration
This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals acros...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Economic Dynamics and Control
سال: 2013
ISSN: 0165-1889
DOI: 10.1016/j.jedc.2012.08.003